* Load Compustat data
sysuse comp_ann, clear
keep gvkey fyear lpermno sich prcc_f csho at ceq sale dlc dltt
rename fyear year
rename lpermno permno

* Merge with patent data
merge 1:1 permno year using patents
rename year fyear
drop if _merge==2 
drop _merge

* Computing industry definitions
bys gvkey: egen sic=median(sich)
replace sic=floor(sic)

* Generate variables
xtset gvkey fyear
g size=log(at)
g mktCap=prcc_f*csho
g mtb=(at-ceq+mktCap)/at
g mktLevRaw=(dlc+dltt)/(dlc+dltt+mktCap)
g incidence=1 if fnpats>0
replace incidence=0 if missing(fnpats)

* Sample selection and filtering
keep if fyear>=1975 & fyear<2010
drop if (sic>=9000) | ((sic>=4900) & (sic<=4999)) | ((sic>=6000) & (sic<=6999))
drop if ceq<0 | mtb>20 | missing(at) | missing(sale) 

* Winsorizing
foreach var in mktLev {
	winsor2 `var'Raw, replace cuts(2 98) by(fyear)
	gen `var'Orig=`var'Raw
	bysort gvkey: egen mean`var'Raw = mean(`var'Raw) 
	egen `var'Rawsamplemean = mean(`var'Raw)
	gen `var' = `var'Raw - mean`var'Raw + `var'Rawsamplemean
}

* Data for Table 2
keep gvkey fyear incidence mktLev
tabstat mktLev, s(mean var)
tabstat incidence, s(mean)